A coupled HMM approach to video-realistic speech animation
Abstract
We propose a coupled hidden Markov model (CHMM) approach to video-realistic speech animation, which produces realistic facial animation driven by speaker-independent continuous speech. Unlike hidden Markov model (HMM)-based animation approaches that use a single state chain, we use CHMMs to explicitly model the subtle characteristics of audio–visual speech, e.g., the asynchrony, temporal dependency (synchrony), and different speech classes between the two modalities. We derive an expectation maximization (EM)-based A/V conversion algorithm for the CHMMs, which converts acoustic speech into decent facial animation parameters. We also present a video-realistic speech animation system. The system transforms the facial animation parameters into a mouth animation sequence, refines the animation with a performance refinement process, and finally stitches the animated mouth seamlessly onto a background facial sequence. We have compared the animation performance of the CHMMs with that of HMMs, multi-stream HMMs and factorial HMMs, both objectively and subjectively. Results show that the CHMMs achieve superior animation performance. The ph-vi-CHMM system, which adopts different state variables (phoneme states and viseme states) in the audio and visual modalities, performs best. The proposed approach indicates that explicitly modelling audio–visual speech is promising for speech animation. © 2006 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved.
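To make the coupling idea concrete, the following is a minimal sketch of a forward pass over a two-chain coupled HMM, written in plain Python. It is not the EM-based A/V conversion algorithm from the paper, which the abstract does not spell out. It assumes the common CHMM formulation in which each chain's next state depends on the previous states of both chains. The function name `chmm_forward`, the parameter layout, and all numbers in the toy example are hypothetical and chosen only for illustration.

```python
from itertools import product


def chmm_forward(init_a, init_v, trans_a, trans_v, lik_a, lik_v):
    """Likelihood of paired audio/visual frames under a two-chain coupled HMM.

    Assumed (hypothetical) parameter layout:
      init_a[k], init_v[l]  : initial state probabilities of each chain
      trans_a[i][j][k]      : P(audio state k at t | audio i, visual j at t-1)
      trans_v[i][j][l]      : P(visual state l at t | audio i, visual j at t-1)
      lik_a[t][k], lik_v[t][l] : precomputed per-frame observation likelihoods
    """
    n_a, n_v = len(init_a), len(init_v)
    joint = list(product(range(n_a), range(n_v)))

    # Initialisation: the two chains start independently.
    alpha = {
        (k, l): init_a[k] * init_v[l] * lik_a[0][k] * lik_v[0][l]
        for k, l in joint
    }

    # Recursion: each chain's transition conditions on BOTH previous states,
    # which is what lets the model capture audio-visual asynchrony.
    for t in range(1, len(lik_a)):
        new_alpha = {}
        for k, l in joint:
            total = sum(
                alpha[i, j] * trans_a[i][j][k] * trans_v[i][j][l]
                for i, j in joint
            )
            new_alpha[k, l] = total * lik_a[t][k] * lik_v[t][l]
        alpha = new_alpha

    return sum(alpha.values())


# Toy example: 2 audio (phoneme-like) states and 2 visual (viseme-like) states.
# All values below are made up for illustration.
init_a = [0.6, 0.4]
init_v = [0.5, 0.5]
trans_a = [[[0.7, 0.3], [0.4, 0.6]], [[0.2, 0.8], [0.1, 0.9]]]
trans_v = [[[0.8, 0.2], [0.5, 0.5]], [[0.3, 0.7], [0.1, 0.9]]]
lik_a = [[0.9, 0.1], [0.2, 0.8], [0.3, 0.7]]
lik_v = [[0.8, 0.2], [0.6, 0.4], [0.1, 0.9]]

likelihood = chmm_forward(init_a, init_v, trans_a, trans_v, lik_a, lik_v)
print(likelihood)
```

A single-chain HMM would force the audio and visual streams through one shared state sequence. Here the two chains keep separate state variables, which loosely mirrors the phoneme-state and viseme-state split described for the ph-vi-CHMM system.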
Similar resources
Text Driven 3D Photo-Realistic Talking Head
We propose a new 3D photo-realistic talking head with a personalized, photo-realistic appearance. Different head motions and facial expressions can be freely controlled and rendered. It extends our prior, high-quality, 2D photo-realistic talking head to 3D. Around 20 minutes of audio-visual 2D video are first recorded of a speaker reading prompted sentences. We use a 2D-to-3D reconstru...
HMM-based motion trajectory generation for speech animation synthesis
Synthesis of realistic facial animation for arbitrary speech is an important but difficult problem. The difficulties lie in the synchronization between lip motion and speech, articulation variation under different phonetic contexts, and expression variation across different speaking styles. To solve these problems, we propose a visual speech synthesis system based on a five-state, multi-stream HMM, wh...
Speech-driven facial animation using a hierarchical model - Vision, Image and Signal Processing, IEE Proceedings
A system capable of producing near video-realistic animation of a speaker given only speech inputs is presented. The audio input is a continuous speech signal that requires no phonetic labelling and is speaker-independent. The system requires only a short video training corpus of a subject speaking a list of viseme-targeted words in order to achieve convincingly realistic facial synthesis. The system...
Visual speech synthesis from 3D video
Data-driven approaches to 2D facial animation from video have achieved highly realistic results. In this paper we introduce a process for visual speech synthesis from 3D video capture to reproduce the dynamics of 3D face shape and appearance. Animation from real speech is performed by path optimisation over a graph representation of phonetically segmented captured 3D video. A novel similarity m...
Speech and Expression Driven Animation of a Video-Realistic Appearance Based Hierarchical Facial Model
We describe a new facial animation system based on a hierarchy of morphable sub-facial appearance models. The innovation in our approach is that through the hierarchical model, parametric control is available for the animation of multiple sub-facial areas. We animate these areas automatically both from speech to produce lip-synching, and natural pauses and hesitations and using specific tempora...
Journal:
- Pattern Recognition
Volume: 40
Publication date: 2007